Skip to content

Update VerifyDevicePath logic to run udevadm --trigger to fix scenarios where disk is not found or is wrong device #459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 4, 2020

Conversation

davidz627
Copy link
Contributor

@davidz627 davidz627 commented Jan 28, 2020

TODO:

  • Write an E2E test that tests this functionality: attaches disk -> purposely links the /dev/by-id/{diskName} to the wrong device or deletes the /dev/by-id/{diskName} entirely then make sure NodeStage is able to self-heal that using udevadm --trigger

/kind bug
/assign @msau42
Fixes #376

/lib/udev/scsi_id was pointing to the binary on the host due to the /lib/udev hostPath. This was causing weird package dynamically linked library finding issues (couldn't find the right version of dynamic libs). The solution is to use the /lib/udev/scsi_id from the container by copying it into a special directory. Thanks @msau42 for pointing out that /lib/udev was coming from the host

ig9OoyenpxqdCQyABmOQBZDI0duHk2QZZmWg2Hxd4ro

During NodeStageVolume run udevadm --trigger to fix device symlinks if device path is not found or device path points to the wrong device

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 28, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidz627

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 28, 2020
@k8s-ci-robot k8s-ci-robot requested review from msau42 and verult January 28, 2020 02:39
@davidz627
Copy link
Contributor Author

Wrote the e2e test and confirmed its failing before the udevadm logic fix, and passes afterwards :)

@@ -49,8 +49,8 @@ func GCEClientAndDriverSetup(instance *remote.InstanceInfo) (*remote.TestContext
endpoint := fmt.Sprintf("tcp://localhost:%s", port)

workspace := remote.NewWorkspaceDir("gce-pd-e2e-")
driverRunCmd := fmt.Sprintf("sh -c '/usr/bin/nohup %s/gce-pd-csi-driver --endpoint=%s> %s/prog.out 2> %s/prog.err < /dev/null &'",
workspace, endpoint, workspace, workspace)
driverRunCmd := fmt.Sprintf("sh -c '/usr/bin/nohup %s/gce-pd-csi-driver -v=4 --endpoint=%s 2> %s/prog.out < /dev/null > /dev/null &'",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small fix to improve debugging experience. This is on purpose

@@ -28,10 +28,6 @@ import (
mountmanager "sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/pkg/mount-manager"
)

func init() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had two init functions before, I don't know if that was causing issues but it doesn't hurt to fix

@davidz627 davidz627 force-pushed the fix/udev branch 2 times, most recently from 4bd3d05 to a2c0480 Compare January 29, 2020 21:27
// candidate devicePaths or an empty string if none is found. If
// /lib/udev/scsi_id exists it will attempt to fix any issues caused by missing
// paths or mismatched devices by running a udevadm --trigger.
func (m *deviceUtils) VerifyDevicePath(devicePaths []string, diskName string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need devicePaths as an argument, or can we generate it inside this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't really have to, but then we would need to add partition as an argument. (I can follow up but probably out of scope for this PR)

@davidz627 davidz627 force-pushed the fix/udev branch 3 times, most recently from 7ad145d to ea9fc37 Compare February 4, 2020 00:49
}

// ValidateLogicalLinkIsDisk takes a symlink location at "link" and finds the
// link location - it then finds the backing PD using scsi_id and validates that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now, although when NVMe comes up, this test will get outdated. I think writing data and verifying it would be more portable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is writing data would work if the symlink was made incorrectly, so I'm not sure it validates what we care about. (Which is that the volume matches the device). Writing this check was much harder than writing data :)

I don't think network attached PD supports NVME. Only available for local ssd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there's no NVME for PD now, I'm just saying for future proofing.

…os where disk is not found or is wrong device
@msau42
Copy link
Contributor

msau42 commented Feb 4, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 4, 2020
@k8s-ci-robot k8s-ci-robot merged commit e4b7960 into kubernetes-sigs:master Feb 4, 2020
@davidz627 davidz627 deleted the fix/udev branch February 4, 2020 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

udevadm may be never triggered right now
3 participants